================================================================================
README.txt - Figure 2
================================================================================

### PROJECT TITLE
Matrix Product Approximation: Comparative Analysis of Sampling Algorithms and Theoretical Bounds (Refactored Version)

### OBJECTIVE
This project implements and compares various algorithms for approximating the matrix product A @ B.T, where A and B share a common dimension. It focuses on evaluating the empirical performance (relative error) of different column/feature selection and sketching techniques against their theoretical error bounds. The refactored version separates core functionalities into a library for better organization and reusability.

### FILE STRUCTURE
The project consists of two main Python files:

1.  `matrix_product_approximations_exp2.py`:
    *   This is a library file containing all core functionalities.
    *   Includes:
        *   Utility functions (norm calculations, SVD, filename sanitization).
        *   Matrix generation functions (e.g., Uniform, Gaussian, Row Orthogonal, Repeated Columns, Nonlinear transformations).
        *   Implementations of matrix product approximation algorithms:
            *   Standard Sampling: Uniform, Leverage Score (RMM), Deterministic.
            *   Custom/Advanced: Greedy OMP Column Selection, Gaussian Projection Sketching.
        *   Functions to calculate theoretical error bounds (user-provided custom bounds and standard literature bounds).
        *   A multi-panel plotting function (`plot_multi_panel_results`) designed to display results without individual legends on subplots, suitable for comparing multiple matrix types.
        *   Global styling configurations for plots.

2.  `run_experiment2.py`:
    *   This is the main executable script.
    *   It imports the necessary functions from `matrix_product_approximations_exp2.py`.
    *   Defines experiment parameters (matrix dimensions, range of k values, number of trials).
    *   Sets up a dictionary of matrix generators to be tested.
    *   Orchestrates the execution of experiments:
        *   Generates matrices.
        *   Calculates the Rho_G metric.
        *   Runs various approximation algorithms for different k values.
        *   Computes theoretical bounds.
        *   Collects and organizes results.
    *   Calls the plotting function from the library to generate and save multi-panel comparison plots for predefined groups of matrix types.

### SETUP / REQUIREMENTS
1.  **Python**: Python 3.7 or higher is recommended.
2.  **Core Libraries**:
    *   NumPy: For numerical operations (`pip install numpy`)
    *   SciPy: For scientific and technical computing, including sparse matrices and SVD (`pip install scipy`)
    *   Matplotlib: For plotting (`pip install matplotlib`)
3.  **Optional Library (for specific bounds)**:
    *   CVXPY: For solving convex optimization problems; used for the "Your Bound (QP CVXPY Best)" calculation. If CVXPY is not installed, the script issues a warning and skips this bound; all other algorithms and bounds run normally. (`pip install cvxpy`)

Ensure these libraries are installed in your Python environment.
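The warn-and-skip behavior described for CVXPY can be implemented with a guarded import. The sketch below is one common way to do it. The names `CVXPY_AVAILABLE` and `maybe_compute_cvxpy_bound` are hypothetical and are not taken from the library.

```python
import warnings

# Optional dependency: only the "Your Bound (QP CVXPY Best)" calculation needs it.
try:
    import cvxpy as cp
    CVXPY_AVAILABLE = True
except ImportError:
    cp = None
    CVXPY_AVAILABLE = False
    warnings.warn(
        "CVXPY is not installed; skipping the 'Your Bound (QP CVXPY Best)' bound.",
        RuntimeWarning,
    )


def maybe_compute_cvxpy_bound(compute_fn, *args, **kwargs):
    """Call compute_fn only when CVXPY is importable; otherwise return None (bound skipped)."""
    if not CVXPY_AVAILABLE:
        return None
    return compute_fn(*args, **kwargs)
```

With this pattern, downstream code such as the plotting step would need to tolerate a missing (`None`) value for that bound.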

### EXECUTION INSTRUCTIONS
1.  Place both `matrix_product_approximations_exp2.py` and `run_experiment2.py` in the same directory.
2.  Open a terminal or command prompt.
3.  Navigate to the directory where you saved the files.
4.  Run the main script using the command:
    `python run_experiment2.py`

The script will print progress updates to the console, including the current matrix type being processed, calculated Rho_G values, and notifications when plots are saved.

### CONFIGURATION
Key parameters for the experiments can be modified directly within the `run_experiment2.py` script, primarily in the `if __name__ == "__main__":` block:

*   **Matrix Dimensions**:
    *   `n_exp`: Number of rows in matrix A.
    *   `m_exp`: The common dimension (number of columns in A and B). This is the dimension from which `k` features/columns are selected or sketched.
    *   `p_exp`: Number of rows in matrix B.
*   **Sparsity/Sketch Size (k)**:
    *   `k_ratio_start`, `k_ratio_end`: Define the range of k as a ratio of `m_exp`.
    *   `num_k_steps`: Number of k values to test within this range.
*   **Experiment Control**:
    *   `n_trials_exp`: Number of random trials to average for stochastic algorithms (e.g., Uniform, Leverage Score, Gaussian Projection) at each k value.
    *   `main_seed`: Master seed for reproducibility.
*   **Output**:
    *   `plot_directory`: Name of the sub-directory where generated plots will be saved. This directory will be created if it doesn't exist.
*   **Matrix Generators**:
    *   The `matrix_generators_dict` in `run_experiment2.py` defines which matrix types are tested. You can comment out or add entries to this dictionary to select different generators defined in `matrix_product_approximations_exp2.py`. Each key is a descriptive name, and each value is a lambda function that calls the corresponding generator from the library.
*   **Plot Grouping**:
    *   The `plot_groups` list in `run_experiment2.py` defines how matrix types are grouped into multi-panel plots. Modify this to change the content and organization of the output figures.

### OUTPUT
1.  **Console Output**:
    *   Configuration parameters used for the run.
    *   Progress updates for each matrix type and experiment.
    *   Calculated Rho_G values for each matrix pair.
    *   Path to the saved plot files.
    *   Total execution time.
2.  **Plots**:
    *   PNG image files saved in the specified `plot_directory`.
    *   Each plot is a multi-panel figure, with each panel representing a different matrix type within a defined group.
    *   The x-axis typically represents `k / m_exp` (sparsity ratio or sketch size relative to the common dimension).
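    *   The y-axis shows one of two quantities. For algorithms, it is the relative Frobenius norm error `||A @ B.T - C @ W.T||_F / ||A @ B.T||_F`, where `C @ W.T` is the approximation the algorithm builds from k selected or sketched columns. For theoretical bounds, it is the bound value, also normalized by `||A @ B.T||_F`.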
    *   Plots are generated with NO LEGENDS on individual subplots to maintain clarity in multi-panel views. Refer to the `IMPROVED_STYLES` dictionary in `matrix_product_approximations_exp2.py` to identify lines by their style (color, marker, linestyle).
    *   The title of each subplot indicates the matrix type and its approximate Rho_G value.
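The plotted error can be computed directly from the y-axis formula. The sketch below assumes the algorithm returns its approximation as a factor pair `C` (n x k) and `W` (p x k). The function names are hypothetical.

```python
import numpy as np


def relative_frobenius_error(A, B, C, W):
    """||A @ B.T - C @ W.T||_F / ||A @ B.T||_F, the quantity plotted for algorithms."""
    exact = A @ B.T
    approx = C @ W.T
    return np.linalg.norm(exact - approx, "fro") / np.linalg.norm(exact, "fro")


def normalized_bound(bound_value, A, B):
    """Put a bound on the same scale as the errors by dividing by ||A @ B.T||_F."""
    return bound_value / np.linalg.norm(A @ B.T, "fro")


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((5, 4))
print(relative_frobenius_error(A, B, A, B))                # all columns kept: 0.0
print(relative_frobenius_error(A, B, A[:, :0], B[:, :0]))  # no columns kept: 1.0
```

The two extremes show the scale: keeping every column gives 0, and keeping none gives 1.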

### INTERPRETING RESULTS
*   **Lower Relative Error**: Indicates better approximation by an algorithm.
*   **Bounds**: Theoretical bounds provide an upper limit on the expected error. Tighter bounds (closer to the actual errors) are more informative.
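*   **Rho_G (ρ_G)**: This metric is calculated as `trace(G_A * G_B) / sum(G_A * G_B)`. Here `G_A = A.T @ A` and `G_B = B.T @ B` are the `m_exp x m_exp` Gram matrices, and `*` is the elementwise (Hadamard) product. The metric describes how the energy of the product is distributed across the common dimension, and it can influence the behavior of certain bounds and algorithms. ρ_G is always at least `1/m_exp`:
    *   Values near `1/m_exp` suggest a more uniform distribution of energy.
    *   Values near 1 suggest diagonal dominance in the Hadamard product of the Gram matrices.
    *   Values above 1 indicate cancellation between columns, because the off-diagonal terms sum to a negative number.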
*   **Comparison Across Matrix Types**: Observe how different algorithms and bounds perform under various data distributions (e.g., uniform, Gaussian, with repeated columns, non-linearities).
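ρ_G is defined as `trace(H) / sum(H)` with `H = (A.T @ A) * (B.T @ B)`, and it can be computed in a few lines. The sum of all entries of `H` equals `||A @ B.T||_F^2`, and the trace equals the sum over columns j of `||A[:, j]||^2 * ||B[:, j]||^2`. The helper name `rho_g` is hypothetical and may differ from the library's own implementation.

```python
import numpy as np


def rho_g(A, B):
    """rho_G = trace(H) / sum(H) with H = (A.T @ A) * (B.T @ B) (Hadamard product).

    A is n x m and B is p x m; H is m x m. sum(H) equals ||A @ B.T||_F**2.
    """
    H = (A.T @ A) * (B.T @ B)
    return np.trace(H) / H.sum()


print(rho_g(np.eye(4), np.eye(4)))                             # orthonormal columns: 1.0
print(rho_g(np.ones((3, 4)), np.ones((2, 4))))                 # identical columns: 0.25 (= 1/m)
print(rho_g(np.array([[1.0, 1.0]]), np.array([[1.0, -0.5]])))  # cancellation: 5.0
```

The last case gives `A @ B.T = 1 - 0.5`. The column-wise terms partly cancel, so the denominator is small and ρ_G exceeds 1.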

### CODE OVERVIEW

*   **`matrix_product_approximations_exp2.py`**:
    *   **Matrix Generation**: Functions like `generate_matrices_uniform`, `generate_matrices_gaussian_cancellation`, etc., create pairs of matrices (A, B) with specific properties.
    *   **Sampling/Sketching Algorithms**:
        *   `uniform_sampling`, `rmm_sampling` (Leverage Score), `deterministic_sampling`: Implement standard column selection methods.
        *   `run_greedy_selection_omp`, `run_gaussian_projection`: Implement more advanced/custom approximation techniques.
    *   **Bound Calculation**:
        *   `compute_theoretical_bounds`: Calculates user-provided custom bounds (Binary, QP Analytical, QP CVXPY Best).
        *   `compute_standard_bounds`: Calculates bounds like Leverage Score Expectation and a simple Sketching bound.
        *   `compute_all_bounds_orchestrator`: Manages the calculation of all bounds for a given k.
    *   **Experiment Logic**: `run_algorithm_experiments` handles running algorithms for a range of k values and computing their errors.
    *   **Plotting**: `plot_multi_panel_results` generates the final output figures.

*   **`run_experiment2.py`**:
    *   Sets up global parameters for the entire suite of experiments.
    *   Iterates through the selected `matrix_generators_dict`.
    *   For each matrix type:
        *   Generates A and B.
        *   Calls `run_algorithm_experiments` to get empirical errors.
        *   Calls `compute_all_bounds_orchestrator` to get theoretical bound values.
        *   Stores all results.
    *   After all matrix types are processed, it iterates through `plot_groups` and calls `plot_multi_panel_results` to visualize the grouped results.
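The sampling and sketching families listed above can be illustrated with the minimal sketch below. It rests on several assumptions:

*   Randomized methods sample k column indices i.i.d. (with replacement) and rescale by `1/sqrt(k * p_j)`, so that `C @ W.T` is an unbiased estimate of `A @ B.T`.
*   The RMM variant uses probabilities proportional to `||A[:, j]|| * ||B[:, j]||`. The library's `rmm_sampling`, labeled "Leverage Score", may use a different distribution.
*   The deterministic variant keeps the top-k columns by that same score, without rescaling.
*   The Gaussian projection uses an `m x k` matrix with i.i.d. `N(0, 1/k)` entries.

All function names here are hypothetical. The greedy OMP method is not sketched.

```python
import numpy as np


def _sample_with_replacement(A, B, k, p, rng):
    """Draw k column indices i.i.d. from p and rescale so that E[C @ W.T] = A @ B.T."""
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    scale = 1.0 / np.sqrt(k * p[idx])  # shape (k,), broadcast across each selected column
    return A[:, idx] * scale, B[:, idx] * scale


def sketch_uniform(A, B, k, rng):
    m = A.shape[1]
    return _sample_with_replacement(A, B, k, np.full(m, 1.0 / m), rng)


def sketch_norm_product(A, B, k, rng):
    """RMM-style sampling with p_j proportional to ||A[:, j]|| * ||B[:, j]||."""
    w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return _sample_with_replacement(A, B, k, w / w.sum(), rng)


def sketch_deterministic(A, B, k):
    """Keep the k columns with the largest ||A[:, j]|| * ||B[:, j]||, unscaled."""
    w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    idx = np.argsort(w)[::-1][:k]
    return A[:, idx], B[:, idx]


def sketch_gaussian(A, B, k, rng):
    """Gaussian projection: S has i.i.d. N(0, 1/k) entries, so E[S @ S.T] = I."""
    S = rng.standard_normal((A.shape[1], k)) / np.sqrt(k)
    return A @ S, B @ S


def expected_sq_error(A, B, k, p):
    """Exact E||A @ B.T - C @ W.T||_F^2 for i.i.d. sampling with probabilities p (all p > 0)."""
    a2 = np.linalg.norm(A, axis=0) ** 2
    b2 = np.linalg.norm(B, axis=0) ** 2
    return (np.sum(a2 * b2 / p) - np.linalg.norm(A @ B.T, "fro") ** 2) / k


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
B = rng.standard_normal((15, 50))
C, W = sketch_norm_product(A, B, k=10, rng=rng)
print(C.shape, W.shape)  # (20, 10) (15, 10)
```

The expected squared error above makes two points concrete. First, norm-product probabilities minimize it among all sampling distributions. Second, with uniform `p_j = 1/m` it reduces to `||A @ B.T||_F^2 * (m * ρ_G - 1) / k`, which ties uniform sampling directly to ρ_G.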

### CUSTOMIZATION
*   **Adding New Matrix Generators**:
    1.  Implement a new generator function in `matrix_product_approximations_exp2.py` (e.g., `generate_my_new_matrix_type(...)`).
    2.  In `run_experiment2.py`, add an entry to `matrix_generators_dict` to include it in experiments.
*   **Adding New Algorithms/Bounds**:
    1.  Implement the algorithm/bound calculation function in `matrix_product_approximations_exp2.py`.
    2.  Integrate its calling logic into `run_algorithm_experiments` or `compute_all_bounds_orchestrator`.
    3.  Add a corresponding style entry to `IMPROVED_STYLES` in `matrix_product_approximations_exp2.py` for plotting.
    4.  Ensure the results are collected and passed to the plotting function.

================================================================================